A Neural Oscillator Sound Separator for Missing Data Speech Recognition
نویسندگان
چکیده
In order to recognise speech in a background of other sounds, human listeners must solve two perceptual problems. First, the mixture of sounds reaching the ears must be parsed to recover a description of each acoustic source, a process termed ‘auditory scene analysis’. Second, recognition of speech must be robust even when the acoustic evidence is missing due to masking by other sounds. This paper describes an automatic speech recognition system that addresses both of these issues, by combining a neural oscillator model of auditory scene analysis with a framework for ‘missing data’ recognition of speech.
منابع مشابه
روشی جدید در بازشناسی مقاوم گفتار مبتنی بر دادگان مفقود با استفاده از شبکه عصبی دوسویه
Performance of speech recognition systems is greatly reduced when speech corrupted by noise. One common method for robust speech recognition systems is missing feature methods. In this way, the components in time - frequency representation of signal (Spectrogram) that present low signal to noise ratio (SNR), are tagged as missing and deleted then replaced by remained components and statistical ...
متن کاملSpeech Recognition with Missing Data using Recurrent Neural Nets
In the ‘missing data’ approach to improving the robustness of automatic speech recognition to added noise, an initial process identifies spectraltemporal regions which are dominated by the speech source. The remaining regions are considered to be ‘missing’. In this paper we develop a connectionist approach to the problem of adapting speech recognition to the missing data case, using Recurrent N...
متن کاملPersian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...
متن کاملSimultaneous Speech Recognition Based on Automatic Missing Feature Mask Generation by Integrating Sound Source Separation
Our goal is to realize a humanoid robot that has the capabilities of recognizing simultaneous speech. A humanoid robot under real-world environments usually hears a mixture of sounds, and thus three capabilities are essential for robot audition; sound source localization, separation, and recognition of separated sounds. In particular, an interface between sound source separation and speech reco...
متن کاملOn the Relation between Statistical Properties of Spectrographic Masks and Recognition Accuracy
Missing Data Techniques (MDT) can significantly improve the accuracy of automatic speech recognition (ASR) for speech corrupted by background noise. The increase in recognition accuracy obtained using MDT is largely dependent on the estimation of spectrographic masks used to distinguish speech from noise. We present an analysis technique which enables us to compare two mask estimation technique...
متن کامل